Two Algorithms for Inducing Structural Equation Models from Data
Abstract
We present two algorithms for inducing structural equation models from data. Assuming no latent variables, these models have a causal interpretation and their parameters may be estimated by linear multiple regression. Our algorithms are comparable with PC [15] and IC [12, 11], which rely on conditional independence. We present the algorithms and empirical comparisons with PC and IC.
This research is supported by ARPA/Rome Laboratory under contract #'s F30602-91-C-0076 and F306023-93C-0100; and by a NASA GSRP Training Grant, NGT 70358.
1. Structural Equation Models
Given a dependent variable $x_0$ and a set of predictor variables $P = \{x_1, x_2, \ldots, x_k\}$, multiple regression algorithms find subsets $p \subseteq P$ that account for "much" of the variance in $x_0$. These are search algorithms, and they are not guaranteed to find the "best" $p$, the one that makes $R^2$ as large as possible, especially when $p$ is a proper subset of $P$ [10, 3, 7]. The question arises: what should be done with the variables in $T = P \setminus p$, the ones that are not selected as predictors of $x_0$? In many data analysis tasks, the variables in $T$ are used to predict the variables in $p$. For instance, we might select $x_1$ and $x_3$ as predictors of $x_0$; $x_2$, $x_5$, and $x_6$ as predictors of $x_1$; $x_4$ as a predictor of $x_2$ and $x_3$; and so on. We can write these as structural equations:
$$x_0 = \beta_{0,1} x_1 + \beta_{0,3} x_3 + u$$
$$x_1 = \beta_{1,2} x_2 + \beta_{1,5} x_5 + \beta_{1,6} x_6 + v$$
$$x_2 = \beta_{2,4} x_4 + w$$
$$x_3 = \beta_{3,4} x_4 + z$$
Here $\beta_{i,j}$ denotes the coefficient of $x_j$ in the equation for $x_i$, and $u$, $v$, $w$, $z$ are error terms.
The principal task for any modeling algorithm is to decide, for a given "predictee," which variables should be in $p$ and which should be in $T$. Informally, we must decide where in a structural equation model each variable does the most good. For example, parents' education (PE) and child's education (CE) could both be used as predictors of a child's satisfaction when he or she takes a job (JS), but we might prefer a model in which PE predicts CE and CE predicts JS (or one in which PE predicts both CE and JS). This paper presents two algorithms that build structural equation models.
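In this example there are no latent variables. If we also assume that each equation's error term is independent of the other errors and of that equation's predictors, then each structural equation can be estimated separately by ordinary least squares. The sketch below rests on those assumptions and is not the fbd or ftc search. It takes the structure as given and uses made-up coefficients. It simulates data from the example model and recovers the coefficients one equation at a time. The names `TRUE_BETA`, `STRUCTURE`, `simulate`, and `fit_structural_equations` are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical coefficients for the example model; key (i, j) is the
# coefficient of x_j in the structural equation for x_i.
TRUE_BETA = {
    (0, 1): 0.7, (0, 3): 0.2,
    (1, 2): 0.5, (1, 5): 0.4, (1, 6): -0.3,
    (2, 4): 0.8,
    (3, 4): 0.6,
}

# Predictor set p for each predictee, matching the equations in the text.
STRUCTURE = {0: [1, 3], 1: [2, 5, 6], 2: [4], 3: [4]}


def simulate(n, seed=0):
    """Draw n samples from the model.

    Assumption: the errors are independent, unit-variance Gaussians.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 7))
    for j in (4, 5, 6):                  # exogenous variables
        x[:, j] = rng.normal(size=n)
    for i in (3, 2, 1, 0):               # every predictor is generated before its predictee
        noise = rng.normal(size=n)       # plays the role of z, w, v or u
        x[:, i] = sum(TRUE_BETA[(i, j)] * x[:, j] for j in STRUCTURE[i]) + noise
    return x


def fit_structural_equations(x, structure):
    """Estimate each equation separately by ordinary least squares."""
    estimates = {}
    for i, predictors in structure.items():
        # Design matrix: an intercept column followed by the predictors of x_i.
        design = np.column_stack([np.ones(len(x))] + [x[:, j] for j in predictors])
        coef, *_ = np.linalg.lstsq(design, x[:, i], rcond=None)
        for j, b in zip(predictors, coef[1:]):
            estimates[(i, j)] = float(b)
    return estimates


data = simulate(20_000)
est = fit_structural_equations(data, STRUCTURE)
for key in sorted(est):
    print(key, round(est[key], 2))
```

With 20,000 samples the printed estimates should fall close to the made-up values in `TRUE_BETA`. Choosing `STRUCTURE` in the first place, that is, deciding which variables go in $p$ and which in $T$, is the hard part. Estimating the coefficients afterward is the easy part.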
There are clear parallels between building structural equation models and building causal models. Indeed, path analysis refers to the business of interpreting structural equation models as causal models [9, 16]. Path analysis has been heavily criticized (e.g., [13, 8]), in part because latent variables can produce large errors in estimated regression coefficients throughout a model [15]. Recent causal induction algorithms rely not on regression coefficients but on conditional independence [11, 15]. These algorithms use covariance information only to infer Boolean conditional independence constraints. They do not estimate the strengths of causal relationships. Most importantly from our perspective, they do not use these strengths to guide the search for causal models. Our algorithms, called fbd and ftc, use covariance information, in the form of estimated standardized regression coefficients, to direct the construction of structural equation models and to estimate their parameters. Because latent variables can bias these estimates, our algorithms might be misled when latent variables are at work. In practice, however, fbd and ftc are more robust than, say, stepwise multiple regression: they often discard predictors that are related to the predictee only through a latent variable [2]. We have not yet shown analytically why the algorithms have this advantage. Until we do, our only claim for fbd and ftc is this: when latent variables are at work, our algorithms build multilevel regression models of heuristic value to analysts, just as ordinary regression algorithms build useful (but suspect) single-level models. If we can assume causal sufficiency [15], which essentially means no latent variables, these models may be interpreted as causal models [5].
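Standardized regression coefficients can be computed from covariance information alone. For a predictee $x_0$ and predictor set $p$, they solve $R_{pp}\,\beta = r_{p0}$. Here $R_{pp}$ is the correlation matrix among the predictors, and $r_{p0}$ holds their correlations with $x_0$. The sketch below is a generic illustration, not the fbd or ftc procedure. It assumes NumPy and uses made-up simulated data. The helper names `standardized_betas` and `zscore` are hypothetical. The sketch computes the coefficients from the correlation matrix and checks them against least squares on z-scored columns. It also shows how an unmeasured common cause gives a predictor a sizable coefficient, even though the predictee does not depend on that predictor directly.

```python
import numpy as np


def standardized_betas(data, target, predictors):
    """Standardized coefficients of column `target` regressed on columns `predictors`.

    The correlation matrix R is computed first, and then R[p, p] @ beta = R[p, target]
    is solved for beta.
    """
    corr = np.corrcoef(data, rowvar=False)
    r_pp = corr[np.ix_(predictors, predictors)]
    r_pt = corr[predictors, target]
    return np.linalg.solve(r_pp, r_pt)


def zscore(a):
    """Rescale each column to mean 0 and standard deviation 1."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


# Made-up data: `latent` is an unmeasured common cause of x1 and y.
rng = np.random.default_rng(1)
n = 50_000
latent = rng.normal(size=n)
x1 = latent + rng.normal(size=n)             # related to y only through `latent`
x2 = rng.normal(size=n)                      # a direct cause of y
y = 0.5 * x2 + latent + rng.normal(size=n)   # no direct dependence on x1
data = np.column_stack([y, x1, x2])          # `latent` itself is not recorded

betas = standardized_betas(data, 0, [1, 2])

# Cross-check: least squares on z-scored columns gives the same coefficients.
z = zscore(data)
ols, *_ = np.linalg.lstsq(z[:, [1, 2]], z[:, 0], rcond=None)

print("from correlations:", np.round(betas, 3))
print("from z-scored OLS:", np.round(ols, 3))
```

With these made-up parameters, x1 and x2 are uncorrelated in the population, so the standardized coefficients approximately equal the simple correlations. That gives roughly 0.47 ($1/\sqrt{4.5}$) for x1 and roughly 0.33 for x2. Interpreting the coefficient on x1 causally would therefore be a mistake of exactly the kind that latent variables invite.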
In Preliminary Papers of the Fifth International Workshop on Artificial Intelligence and Statistics